HIGHLIGHTS


• Adaptive statistical background subtraction model. 

• Robust detection of targets with non-uniform and variable thermal profiles, such as vehicles.

• State-of-the-art performance on a diverse dataset of thermal infrared sequences.

• High detection rate with low false alarms. 

A robust contour-based statistical background subtraction method for detection of non-uniform thermal targets in infrared imagery is presented. The first step of the method generates a background frame using statistical information from an initial set of frames that do not contain any targets. The generated background frame is made adaptive by continuously updating it using the motion information of the scene. Background subtraction followed by a clutter rejection stage ensures the detection of foreground objects. The next step detects contours and distinguishes the target boundaries from the noisy background. This is achieved by using the Canny edge detector to extract the contours, followed by a k-means clustering approach to differentiate the object contours from the background contours. The post-processing step uses a morphological edge-linking approach to close any broken contours, and finally flood fill is performed to generate the silhouettes of moving targets. The method is validated on infrared video data consisting of a variety of moving targets. Experimental results demonstrate a high detection rate with minimal false alarms, establishing the robustness of the proposed method.

© 2013 Elsevier B.V. All rights reserved. 


1. Introduction 

Moving target detection is an active research area in computer vision that spans applications such as human identification, robotics, surveillance and perimeter monitoring systems [1]. Target detection refers to the task of determining whether or not an object is present in the scene and, if present, identifying its location and size and extracting it from the background [2]. The criticality of these applications demands persistent and ubiquitous detection of targets. With a focus on robustness, omnipresence and 24 x 7 applicability, the long-wave infrared region of the electromagnetic spectrum is explored. Thermal infrared cameras detect the thermal energy that is emitted by all objects with a temperature above absolute zero. As long as there is some difference in the thermal properties of foreground and background, the foreground region appears in contrast to the background, making thermal cameras capable of detection in both day and night time [2]. Thermal imaging avoids problems such as soft shadows, sudden illumination changes, and lack of visibility caused by harsh environmental conditions and night time. However, thermal imaging poses certain challenges such as low signal-to-noise ratio, non-repeatability of target signature, competing background clutter, lack of a priori information, and weather-induced artefacts [3]. As this paper describes target detection for vehicles, the major challenge is that the thermal signature of vehicles shows a high degree of variability. The thermal signature of the same vehicle is different at different times of the day. Moreover, the thermal profile of a vehicle varies from one part of the vehicle to another. The wheels have a different thermal profile from the metal body, the area near the engine is hotter than the doors, and the windscreen also has a different thermal profile. It is also observed that the thermal profile of natural objects such as humans and trees is more uniform than that of artificial objects. Another challenge is the lack of prominent edge and boundary information in thermal imagery, which makes detection and segmentation of the object from the image a very complex task.

To address these challenges, an adaptive contour-based statistical background subtraction technique is proposed for detection of moving targets in infrared images. The method is designed to handle the challenges of detecting moving vehicles; however, it is equally capable of detecting humans in thermal imagery. Moreover, the proposed method does not depend on any a priori information of shape or motion for detection. The proposed method has been tested on six challenging infrared videos from the available “CSIR-CSIO Moving Object Thermal Dataset” [3].


2. Related work 

After image acquisition, the first step in recognition applications is to partition the objects of interest, generally the foreground, from the background. The objects of interest are usually in motion; hence, motion plays an important role in target detection. Various shapes, different orientations of motion, non-rigid structure, occlusions, halos and sudden illumination changes pose significant challenges in the extraction of foreground objects from the background of an image. Several approaches exist in the literature to detect targets and tackle the issues discussed above. These can be classified into two broad categories: training-based approaches and approaches that do not require any training.

Some of the detection approaches that require training use wavelet templates [4] and binary template matching [5]. A color- and texture-invariant wavelet template followed by Support Vector Machine (SVM) classifiers is proposed in [4]. In [6], a shape-based hierarchical approach is discussed that uses color and gradient information at pixel, region and frame level together with coarse-to-fine edge matching; it is robust to shifting of static background objects, illumination changes, and initialization of the background model. An approach that detects objects based on motion pattern and appearance, trained using AdaBoost to take advantage of both kinds of information, is proposed in [7]. A two-step detection/tracking method is proposed in [8], where the detection phase is performed by an SVM and the tracking phase is a combination of Kalman filter prediction and mean shift tracking. An approach for detecting and tracking objects using frame-level partitioning and correlation, which does not require training, is examined in [9]. Background subtraction is an object detection approach that does not require large training sets and also provides the silhouettes of the detected objects. Here, “foreground” regions are identified by differencing the input image and a background model. To achieve this, an accurate and adaptive background model is often desirable. In the basic statistical approaches, the distribution of each pixel is modelled as a single Gaussian [10,11], and any new pixel value that does not belong to the distribution is labelled as a foreground pixel. A Mixture of Gaussians was proposed in [12] to better model the complex background processes of each pixel. The Mixture of Gaussians approach was also examined in [13]. Other statistical approaches based on the standard background-subtraction technique and gradient information are discussed in [14-16].

Researchers have proposed various methods of generating background models. In [17], a kernel density estimator was proposed to obtain the distribution of pixel intensities for background modelling. A variable-bandwidth kernel density estimator was proposed in [18]. Temporal analysis of the input video is another technique used to create dynamic background models. Kalman filters for adaptive background estimation were used in [19], and an auto-regressive model was used in [20]. A background subtraction model for dynamic scenes was proposed in [21]. An adaptive background model for dynamic scenes using kernel density estimation [22] was also examined. Wiener filters were employed in three stages (pixel, region and frame) in the Wallflower approach [23]. A background subtraction technique for moving cameras is examined in [24].

The advantage of employing background subtraction for object detection is that the regions in motion are directly obtained. However, this class of methods has two debilitating drawbacks. First, these methods require the camera to be stationary so that reliable background models can be built, which greatly limits their applicability. Secondly, these methods by themselves do not discriminate between different object classes, and all foreground objects in the scene are detected. Nevertheless, in applications such as perimeter monitoring and virtual fencing, where the background is constant, computationally efficient background subtraction approaches are popular.

The proposed approach is inspired by the contour-based statistical background subtraction method discussed in [16], which is used for human detection. That method uses the “halo effect” that appears around very hot or cold objects in the scene and is produced by infrared detectors based on common ferroelectric BST sensors. The work has been extended to thermal video sequences of vehicles captured using a microbolometer-type infrared detector. The halo effect, which acted as an important feature for detecting objects in [16], is absent in microbolometer-type IR detectors, and the complex non-uniform thermal signature of vehicles poses great challenges; the method proposed in [16], when applied directly, fails to achieve accurate results. To handle these issues, several modifications and extensions are proposed, which are discussed in this paper.


3. Proposed method 

The proposed approach, shown in Fig. 1, can be broadly divided into three stages, namely Foreground Object Detection based on Statistical Background Subtraction, Contour Detection using a Contour Saliency Map, and Silhouette Generation.

3.1. Foreground object detection (FOD) 


The first step of FOD is to generate a Statistical Background Model that represents the background accurately. This demands that the input thermal video contains an adequate number of initial frames without objects. To construct the statistical background model, a reference background is first generated from an initial set of N frames by taking the pixel-wise median over the frames; this is named the median image ($I_{med}$). The statistical background model for each pixel is then formed from the N frames by calculating the weighted mean ($\mu$) and variance ($\sigma^2$) as described in Eqs. (1) and (2).


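\[ \mu(x,y) = \frac{\sum_{i=1}^{N} W_i(x,y)\, I_i(x,y)}{\sum_{i=1}^{N} W_i(x,y)} \tag{1} \]

\[ \sigma^2(x,y) = \frac{\sum_{i=1}^{N} W_i(x,y)\,\bigl(I_i(x,y) - \mu(x,y)\bigr)^2}{\frac{N-1}{N}\sum_{i=1}^{N} W_i(x,y)} \tag{2} \]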


where $I_i(x,y)$ is the intensity of the pixel located at $(x,y)$ in the $i$th frame, and the weights $W_i(x,y)$ for each pixel location (calculated using Eq. (3)) are used to minimize the effect of outliers.


\[ W_i(x,y) = \exp\!\left( \frac{\bigl(I_i(x,y) - I_{med}(x,y)\bigr)^2}{-2\,SD^2} \right) \tag{3} \]

Fig. 1. Block diagram of the proposed adaptive contour-based statistical background subtraction method.


After experimental evaluation, the value of SD is taken to be 5. Fig. 2 shows the generated median, weighted mean and weighted variance images of a background frame for a vehicle sequence (“Innova”). After generating the mean/variance background model, the foreground object(s) for a given input frame ($I$) are obtained using the squared Mahalanobis distance: any pixel whose distance exceeds the threshold is taken as a foreground pixel, as described in Eq. (4).

\[ F(x,y) = \begin{cases} 1, & \dfrac{\bigl(I(x,y) - \mu(x,y)\bigr)^2}{\sigma^2(x,y)} > T^2 \\ 0, & \text{otherwise} \end{cases} \tag{4} \]

In this work, the threshold is set to $T^2 = 81$ after experimental evaluation.
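As a minimal sketch of Eqs. (1)-(4), assuming NumPy and a stack of target-free frames of shape (N, H, W), the background model and foreground mask could be computed as follows. The function names, the `eps` guard against zero variance and the (N-1)/N normalisation of the weighted variance are assumptions of this sketch; `sd=5` and `t_squared=81` are the values quoted in the text.

```python
import numpy as np


def build_background_model(frames, sd=5.0):
    """Weighted mean/variance background model in the spirit of Eqs. (1)-(3).

    frames: array-like of shape (N, H, W) with N >= 2 target-free frames.
    sd:     the SD parameter of Eq. (3); 5 is the value reported in the text.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n = frames.shape[0]
    median = np.median(frames, axis=0)  # pixel-wise median image I_med
    # Eq. (3): frames far from the median receive small weights.
    weights = np.exp(-((frames - median) ** 2) / (2.0 * sd ** 2))
    w_sum = weights.sum(axis=0)
    mean = (weights * frames).sum(axis=0) / w_sum  # Eq. (1)
    # Eq. (2); the (N-1)/N factor is an assumed normalisation.
    var = (weights * (frames - mean) ** 2).sum(axis=0) / ((n - 1) / n * w_sum)
    return median, mean, var


def foreground_mask(frame, mean, var, t_squared=81.0, eps=1e-6):
    """Eq. (4): 1 where the squared Mahalanobis distance exceeds T^2."""
    frame = np.asarray(frame, dtype=np.float64)
    # eps is an assumption of this sketch to avoid division by zero.
    dist2 = (frame - mean) ** 2 / np.maximum(var, eps)
    return (dist2 > t_squared).astype(np.uint8)
```

With a per-pixel standard deviation of about one grey level, a threshold of $T^2 = 81$ marks a pixel as foreground only when it deviates from the background mean by more than roughly nine grey levels.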

A connected component algorithm is then used, in which all foreground pixels that are connected to each other are grouped and labelled as one entity. To segregate the foreground objects from background pixels, basic a priori information on area and aspect ratio is used, and clutter is discarded. Fig. 3 shows the output of the connected component algorithm and the output after the clutter rejection stage.
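The labelling and clutter rejection steps can be illustrated with a short sketch in plain Python. The 8-connectivity, the function names and the limits `min_area` and `max_aspect` are hypothetical choices made for illustration; the text does not state the values used.

```python
from collections import deque


def connected_components(mask):
    """Group 8-connected foreground pixels of a binary mask.

    mask: 2-D list (or array) of 0/1 values.
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first search over one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def reject_clutter(components, min_area=4, max_aspect=4.0):
    """Keep components whose area and bounding-box aspect ratio pass the limits.

    min_area and max_aspect are hypothetical thresholds, not values from the text.
    """
    kept = []
    for pixels in components:
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        aspect = max(width, height) / min(width, height)
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append(pixels)
    return kept
```

In this sketch an isolated pixel fails the area test and a long thin streak fails the aspect-ratio test, while a compact blob is retained.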

3.2. Contour detection 

This stage extracts the contours of the target from the detected foreground object. For detecting contours, the gradient information of the input frame and its corresponding background is used. Instead of the simple input-background gradient-difference magnitude, a contour saliency map (CSM) is computed to suppress both large non-object input gradient magnitudes and large non-object input-background gradient-difference magnitudes. The CSM is computed as the pixel-wise minimum of the normalized input gradient magnitudes and the normalized input-background gradient-difference magnitudes, as described in Eq. (5).


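\[ \mathrm{CSM} = \min\!\left( \frac{\lVert (I_x,\, I_y) \rVert}{\mathrm{Max}_{I}},\; \frac{\lVert (I_x - B_x,\; I_y - B_y) \rVert}{\mathrm{Max}_{I-B}} \right) \tag{5} \]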


where $I_x$ and $I_y$ are the gradients of the input image in the horizontal and vertical directions respectively, $B_x$ and $B_y$ are the gradients of the background image in the horizontal and vertical directions respectively, and the normalization factor Max is, for each term, the respective maximum of the input gradient magnitudes and of the input-background gradient-difference magnitudes over the foreground object region. The pixel values of the CSM lie in the range [0,1], with larger values indicating stronger confidence that a pixel belongs to the boundary of a foreground object.
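A minimal NumPy sketch of Eq. (5) is given below. It assumes that the caller passes the input frame and background already cropped to the foreground object region, so that the maxima are taken over that region, and it uses central differences (`np.gradient`) as the gradient operator; the text does not specify the operator, so this choice and the function name are assumptions.

```python
import numpy as np


def contour_saliency_map(frame, background):
    """Pixel-wise minimum of two normalised gradient magnitudes, as in Eq. (5).

    frame, background: 2-D arrays covering the foreground object region.
    """
    frame = np.asarray(frame, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    iy, ix = np.gradient(frame)
    by, bx = np.gradient(background)
    grad_mag = np.hypot(ix, iy)            # input gradient magnitude
    diff_mag = np.hypot(ix - bx, iy - by)  # input-background gradient difference
    max_grad = grad_mag.max()
    max_diff = diff_mag.max()
    # Guard against flat regions, where a maximum of zero would divide by zero.
    norm_grad = grad_mag / max_grad if max_grad > 0 else np.zeros_like(grad_mag)
    norm_diff = diff_mag / max_diff if max_diff > 0 else np.zeros_like(diff_mag)
    return np.minimum(norm_grad, norm_diff)
```

Taking the minimum means that a strong edge that is also present in the background scores zero, because its gradient difference vanishes, whereas an edge that appears only in the input frame keeps a positive saliency.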

Fig. 2. (a) Median image ($I_{med}$), (b) weighted mean image ($\mu$), (c) weighted variance image ($\sigma^2$).

Fig. 3. (a) Output of the connected component algorithm, (b) output after clutter rejection stage.

To ensure that the background aptly corresponds to the current frame, an adaptive background update model is used to generate a background frame for each input frame. In this work, the adaptive background frame is generated using a motion-based background update process that captures naturally occurring temperature variations. The background of the current frame, $B_{n+1}(x,y)$, is generated using the motion mask obtained from the clutter rejection stage, the current frame $I_{n+1}(x,y)$ and the previous background $B_n(x,y)$. Any pixel that is in motion is treated as a foreground pixel and is assigned the value of the previous background, while a pixel that is not in motion is updated using the previous background and the current frame, as described in Eq. (6).

\[ B_{n+1}(x,y) = \begin{cases} \alpha B_n(x,y) + (1-\alpha)\, I_{n+1}(x,y), & \text{if } (x,y) \text{ is non-moving} \\ B_n(x,y), & \text{if } (x,y) \text{ is moving} \end{cases} \tag{6} \]

where $\alpha$ is the weight parameter indicating the significance of the previous background while updating the background.
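The update rule of Eq. (6) amounts to a masked running average, sketched here with NumPy. The function name and the default `alpha=0.9` are hypothetical; the text does not report the value of the weight parameter.

```python
import numpy as np


def update_background(prev_background, frame, motion_mask, alpha=0.9):
    """Motion-gated background update following Eq. (6).

    prev_background: B_n, 2-D array.
    frame:           I_{n+1}, 2-D array of the same shape.
    motion_mask:     non-zero where a pixel is moving.
    alpha:           weight of the previous background (hypothetical default).
    """
    prev_background = np.asarray(prev_background, dtype=np.float64)
    frame = np.asarray(frame, dtype=np.float64)
    moving = np.asarray(motion_mask, dtype=bool)
    blended = alpha * prev_background + (1.0 - alpha) * frame
    # Moving pixels keep the previous background; the rest are blended.
    return np.where(moving, prev_background, blended)
```

A value of `alpha` close to one makes the background follow slow temperature drift while staying insensitive to short-lived changes; pixels under the motion mask are never contaminated by the target.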

To produce thin contours, called the tCSM, non-maximum suppression of edges is performed using the Canny operator. The next step is to convert the tCSM into a binarized tCSM image that retains the most significant contours. This requires a single threshold value that selects the majority of the object's contours and at the same time eliminates the background noise contours. To select the threshold, k-means clustering with two clusters, representing target and background, is applied: the lower cluster is discarded as background by setting its pixel values to zero, and the pixels in the higher cluster are taken as foreground by setting them to one. Fig. 4 shows the output obtained after the CSM, tCSM and binarized tCSM stages respectively.
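The two-cluster thresholding can be sketched as one-dimensional k-means on the tCSM values. Clustering only the non-zero values, initialising the centroids at the minimum and maximum, and the function name are assumptions of this sketch; the non-maximum suppression step itself is not shown.

```python
import numpy as np


def binarize_by_two_means(tcsm, max_iter=100):
    """Binarise a thinned saliency map with 1-D k-means (k = 2).

    Pixels in the higher-valued cluster become 1, all others 0.
    """
    tcsm = np.asarray(tcsm, dtype=np.float64)
    values = tcsm[tcsm > 0]  # assumption: zero pixels are not contour candidates
    if values.size == 0:
        return np.zeros(tcsm.shape, dtype=np.uint8)
    low, high = values.min(), values.max()  # initial centroids (assumption)
    for _ in range(max_iter):
        # In one dimension the cluster boundary is the centroid midpoint.
        threshold = (low + high) / 2.0
        lower = values[values <= threshold]
        upper = values[values > threshold]
        new_low = lower.mean() if lower.size else low
        new_high = upper.mean() if upper.size else high
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high
    threshold = (low + high) / 2.0
    return (tcsm > threshold).astype(np.uint8)
```

Because the threshold is derived from the data of each frame, no fixed saliency cut-off has to be tuned per sequence.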

3.3. Silhouette generation 

The final step of target detection generates target silhouettes from the contour image. The binarized contours obtained are mostly broken; therefore, a flood fill operation cannot be applied directly to obtain the silhouettes. The broken contours first need to be closed and completed. Morphological dilation using a diamond structuring element, followed by morphological closing using a disk structuring element, is applied to complete and close the broken binarized contours. The closed contours are finally flood filled to obtain the silhouettes of the detected target. Fig. 5 shows an input frame containing a target and the corresponding silhouette generated by the proposed method.
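The edge-linking and filling steps can be sketched with NumPy alone. The structuring-element radii (`diamond_radius=1`, `disk_radius=2`), the helper names and the choice to flood fill the background from the image border are assumptions of this sketch, since the text does not give the element sizes or the fill seed.

```python
from collections import deque

import numpy as np


def _shifted(mask, offsets):
    """Yield zero-padded translations of mask, one per (dy, dx) offset."""
    h, w = mask.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    for dy, dx in offsets:
        yield padded[pad + dy: pad + dy + h, pad + dx: pad + dx + w]


def dilate(mask, offsets):
    return np.logical_or.reduce(list(_shifted(mask, offsets)))


def erode(mask, offsets):
    return np.logical_and.reduce(list(_shifted(mask, offsets)))


def close(mask, offsets):
    """Morphological closing, padded so the image border adds no artefacts."""
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    if pad == 0:
        return mask.copy()
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    closed = erode(dilate(padded, offsets), offsets)
    return closed[pad:-pad, pad:-pad]


def fill_closed_contours(contours):
    """Flood fill the background from the border; everything else is silhouette."""
    contours = np.asarray(contours, dtype=bool)
    h, w = contours.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and not contours[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w
                    and not contours[nr, nc] and not outside[nr, nc]):
                outside[nr, nc] = True
                queue.append((nr, nc))
    return ~outside


def silhouette(binary_contours, diamond_radius=1, disk_radius=2):
    """Dilate (diamond), close (disk), then fill; radii are hypothetical."""
    contours = np.asarray(binary_contours, dtype=bool)
    rd, rk = diamond_radius, disk_radius
    diamond = [(dy, dx) for dy in range(-rd, rd + 1) for dx in range(-rd, rd + 1)
               if abs(dy) + abs(dx) <= rd]
    disk = [(dy, dx) for dy in range(-rk, rk + 1) for dx in range(-rk, rk + 1)
            if dy * dy + dx * dx <= rk * rk]
    linked = dilate(contours, diamond)
    return fill_closed_contours(close(linked, disk)).astype(np.uint8)
```

Applied to a square outline with a one-pixel gap, `fill_closed_contours` alone leaks through the gap and leaves the interior empty, whereas `silhouette` first bridges the gap and then returns a filled region.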

4. Experimental data: ‘CSIR-CSIO Moving Object Thermal Dataset’

The proposed approach is tested on six challenging infrared video sequences: four sequences containing different types of vehicles and two sequences of humans. These sequences are part of the ‘CSIR-CSIO Moving Object Thermal Dataset’, which consists of a total of 18 thermal video sequences of 640 x 480 pixel resolution captured using a microbolometer-type thermal imager under two temperature and weather conditions, i.e. around 11:00 am on a hot sunny day and 4:00 pm on a cool day with light showers. Only six of the 18 infrared video sequences were analysed in this work, because the proposed method demands a significant number of background frames for generation of the “Statistical Background Model”, and these are not available in the other sequences. The total number of frames, the maximum number of background frames in the sequence, and the number of background frames used by the proposed method to generate the Statistical Background Model are listed in Table 1.

5. Results and discussion 

To evaluate the performance of the proposed method, we compared it with the popular Mixture of Gaussians (GMM) [12] and the Pseudo Wigner distribution and Renyi entropy based background subtraction (PWD-RE) [3] methods. The optimized GMM implementation provided in the MATLAB® Computer Vision System Toolbox is used in this work for comparative analysis. As can be seen from Fig. 6, the GMM approach splits a single region into multiple parts, thereby depicting it as multiple objects. Moreover, it produces a number of false detections in the ‘Innova’ sequence due to variation in atmospheric conditions and inherent thermal noise. It is evident from the qualitative comparison in Fig. 6 that the proposed method outperforms GMM: it extracts the entire region containing the target as a single entity with no false detections, owing to the use of contour saliency information along with conventional background subtraction. The PWD-RE method was able to detect the objects without any false detection. However, it does not provide much detail about the shape and detects a much larger area than the actual object region, as can be seen in Fig. 6. This is because the PWD-RE approach focuses on high detection with low false alarms and not on the exact shape or boundary of the object. In contrast, the proposed method inherently emphasises shape by using the contour information, thereby providing a better silhouette of the object.

To quantify the performance of the proposed method, the evaluation metrics Sensitivity, Positive Predictive Value (PPV) and F-measure are used. Sensitivity is an indicator of detection rate and PPV is an indicator of false alarms. A high Sensitivity value corresponds to a high detection rate, and a high PPV corresponds to a low number of false alarms. The F-measure, the harmonic mean of Sensitivity and PPV, is also computed to depict the overall performance of the method. The closer the F-measure is to unity, the better the performance of the detection method in terms of both detection rate and false alarms.
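As a worked example of these three metrics, the short sketch below uses the definitions given in the header of Table 2 (S = TP/GT, PPV = TP/(TP + FP)) and the counts reported there for the Bike_complex_background sequence. The function name is hypothetical.

```python
def detection_metrics(gt, tp, fp):
    """Return (sensitivity, PPV, F-measure) from detection counts.

    gt: number of ground-truth targets, tp: true positives, fp: false positives.
    """
    sensitivity = tp / gt
    ppv = tp / (tp + fp)
    # F-measure is the harmonic mean of sensitivity and PPV.
    f_measure = 2 * sensitivity * ppv / (sensitivity + ppv)
    return sensitivity, ppv, f_measure


# Counts for Bike_complex_background in Table 2: GT = 81, TP = 81, FP = 2.
s, ppv, f = detection_metrics(gt=81, tp=81, fp=2)
print(round(s, 4), round(ppv, 4), round(f, 4))  # 1.0 0.9759 0.9878
```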

The development environment of the proposed method is MATLAB R2011a on an Intel Xeon CPU X5660 at 2.80 and 2.79 GHz (two processors) with 12 GB RAM. The current implementation is not optimized for computational efficiency; this can be achieved by parallel processing and vectorization of the implementation. However, even the non-optimized version of the proposed method was observed to be computationally efficient, with an average processing time of 0.6766 s/frame.

Fig. 4. (a) Output of CSM, (b) Output of tCSM, (c) Output of binarized tCSM.



Fig. 5. (a) Input Frame, (b) corresponding silhouette of the target generated by the proposed method. 


Table 1
Description of the “CSIR-CSIO Moving Object Thermal Dataset”.

| Seq. No. | Infrared video name | Total frames | No. of background (BG) frames in the video sequence | No. of BG frames used for the statistical BG model |
|---|---|---|---|---|
| 1 | Ambassador_Far_Bird | 185 | 60 | 30 |
| 2 | Auto_Near | 126 | 53 | 35 |
| 3 | Bike_Complex_Background | 136 | 39 | 26 |
| 4 | Innova | 137 | 29 | 21 |
| 5 | Person_Sack_Walking_Morning | 142 | 1 | 1 |
| 6 | Person_Walking_Morning | 208 | 14 | 12 |



Fig. 6. Qualitative comparison of detection results of GMM, PWD-RE and the proposed approach across different video sequences (Sequence No. 4-6, bottom-up). Columns, left to right: input frame, GMM [12], PWD-RE [3], proposed method.


It can be observed from Table 2 that the proposed method has a 100% detection rate and a low false alarm rate. For the Ambassador_far_bird sequence, both objects in motion, the car and the bird, are detected considerably well despite very low vehicle-to-background contrast. In the Auto_near and Innova sequences, in spite of the non-uniform intensity profile of the vehicle and partial occlusion of the auto, the algorithm is able to clearly detect the vehicle and generate its silhouette. It is also efficient in detecting small targets in a complex background, as observed in the Bike_complex_background sequence. The proposed method has demonstrated accurate detection for pedestrian sequences as well. A few false positives occurred in the Person_walking_morning sequence because of the thermal profile generated by the shadow of the person. The overall performance of the proposed method is high, with an average sensitivity of 1 and an average F-measure of 0.9895, which is fairly high for the chosen dataset containing a variety of targets. Fig. 7 shows the output of the proposed method for a randomly chosen input frame from each of the six sequences.

Table 2
Detection results for the thermal video sequences.

| Thermal video sequence | GT | TP | FP | S = TP/GT | PPV = TP/(TP + FP) | F-measure | Comp. time (s/frame) |
|---|---|---|---|---|---|---|---|
| Ambassador_far_bird | 106 | 106 | 0 | 1 | 1 | 1 | 0.5197 |
| Auto_near | 40 | 40 | 0 | 1 | 1 | 1 | 0.7576 |
| Bike_complex_background | 81 | 81 | 2 | 1 | 0.9759 | 0.9878 | 0.6532 |
| Innova | 88 | 88 | 0 | 1 | 1 | 1 | 0.6697 |
| Person_sack_walking_morning | 126 | 126 | 1 | 1 | 0.9921 | 0.9960 | 0.6840 |
| Person_walking_morning | 113 | 113 | 15 | 1 | 0.8928 | 0.9433 | 0.7753 |
| Total | 645 | 645 | 18 | 1 | 0.9768 | 0.9895 | 0.6766 |

(Key: GT: Ground Truth, TP: True Positive, FP: False Positive, S: Sensitivity, PPV: Positive Predictive Value).

Fig. 7. Output of the six infrared video sequences (Sequence 1-6, left to right): input frame (top row) and the corresponding output target silhouette (bottom row).

6. Conclusion 

The proposed algorithm has been tested on a variety of situations such as complex backgrounds, different environmental conditions and varying target-to-sensor distances. It is evident from the results that the proposed method is able to address the problems posed by thermal imagery. Moreover, the proposed method can effectively detect both uniform (humans) and non-uniform (vehicles) moving targets. The high F-measure values obtained are indicative of the robustness of the proposed method for target detection. The performance of the proposed method can be further enhanced by incorporating an efficient edge-linking technique for accurate retrieval of the object boundary and a more robust statistical background model that considers higher-order statistics. In conclusion, the authors propose a robust adaptive contour-based statistical background subtraction technique with a high detection probability and a low false-alarm probability for moving target detection in infrared image sequences.